Closure Table
Let's explore another way to solve Naive Trees antipattern.
The Closure Table solution is a simple and elegant way of storing hierarchies. It involves storing all paths through the tree, not just those with a direct parent-child relationship.
Creating TreePaths
table with plain Comments
table#
To employ the Closure Table solution, we can create another table, TreePaths
, in addition to a plain Comments
table, having two columns, each of which is a foreign key to the Comments
table.
Then, instead of using the Comments
table to store information about the tree structure, we can use the TreePaths
table.
Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. In addition, add a row for each node to reference itself. Check the illustration below to see how the nodes are paired.
Finding out ancestors of a comment#
To retrieve the ancestors of comment 6, match the rows in TreePaths
whose descendant
is 6:
Inserting a new leaf node#
To insert a new leaf node — for instance, a new child of comment 5 — first insert the self-referencing row. Then add a copy of the set of rows in TreePaths
that reference comment 5 as a descendant
(including the row in which comment 5 references itself), replacing “descendant
” with the number of the new comment. See the working code in the following playground, which we can also change for experimentation:
Deleting a leaf node#
To delete a leaf node — for instance, comment 7 — delete all rows in TreePaths
that reference comment 7 as a descendant
:
Let’s run the query given above in the code widget below and retrieve the data to see the changes.
/
- main.sql
We can delete another leaf node by making the relevant modifications in the playground above.
Deleting a complete subtree#
To delete a complete subtree — for instance, comment 4 and its descendants — we can delete all the rows in TreePaths
that reference comment 4 as a descendant, as well as all rows that reference any of comment 4’s descendants as descendants:
Let’s run this query in the next playground and see the effect in the database.
/
- main.sql
Notice that deleting rows in TreePaths
doesn’t delete the comments themselves. This seems odd in our example (Comments
), but it makes more sense if we’re working with other kinds of trees — for instance, categories in a product catalog or employees in an organization chart. We don’t necessarily want to delete a node when we change its relationship to other nodes. When we store paths in a separate table, it helps make this more flexible.
Moving a subtree#
To move a subtree from one location in the tree to another, first, disconnect the subtree from its ancestors by deleting rows that reference the ancestors of the top node in the subtree and the descendants of that node. For instance, to move comment 6 from its position as a child of comment 4 to a child of comment3, start with the following deletion. Make sure not to delete comment 6’s self-reference.
By selecting the ancestors of comment 6 but not comment 6 itself, and the descendants of comment # 6 together with comment 6, we can correctly remove all the paths from comment 6’s ancestors to comment # 6 and its descendants. In other words, this deletes the paths (1, 6)
, (1,7)
, (4, 6)
, and (4, 7)
. It does not delete (6, 6)
or (6, 7)
.
Then, we can add the orphaned subtree by inserting rows matching the ancestors of the new location and the descendants of the subtree. We can use the CROSS JOIN
syntax to create a Cartesian product, generating the rows needed to match ancestors of the new location to all the nodes in the subtree we need to move.
Let’s run the queries in the following playground for moving a subtree and see the effect on the database.
/
- main.sql
This creates new paths using the ancestors of comment 3, including comment 3, and the descendants of comment # 6, including comment # 6. So, the new paths are (1, 6)
, (2, 6)
, (3, 6)
, (1, 7)
, (2, 7)
, and (3, 7)
. The result is that the subtree starting with comment 6 is relocated as a child of comment 3. The cross join creates all the needed paths, even if the subtree is moved to a higher or lower level in the tree.
Traits of Closure Table design#
The Closure Table design is more straightforward than the Nested Sets design. Both have quick and easy methods for querying ancestors and descendants, but the Closure Table is easier to use to maintain hierarchy information. In both designs, it’s more convenient to query immediate child or parent nodes than in the Adjacency List or Path Enumeration designs.
However, we can improve the Closure Table to make queries for immediate parent or child nodes easier.
Add a TreePaths.path_length
attribute to the Closure Table design. The path_length
of a node’s self-reference is zero, the path_length
of its immediate child is 1, the path_length
of its grandchild is 2, and so on. Finding the children of comment # 4 is now straightforward:
The query shows the children of node 4. We can also check it for any other node present in the database.
In this way, we can easily access the immediate children of a node.